-
Notifications
You must be signed in to change notification settings - Fork 839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add lexsort benchmark (#2871) #2929
Conversation
let mut converter = RowConverter::new(fields); | ||
let rows = converter.convert_columns(&arrays).unwrap(); | ||
let mut sort: Vec<_> = rows.iter().enumerate().collect(); | ||
sort.sort_unstable_by(|(_, a), (_, b)| a.cmp(b)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still a little bit confused as to why lexsort_to_indices can use sort_unstable whilst claiming to be a stable sort, but perhaps I've missed some subtlety. I just do the same thing here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we changed lexsort_to_indices to be unstable in the name of performance. As in it shouldn't be claiming to be stable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if the lexsort to indices is correctly doing stable sort. However it's possible to use unstable sort for stable sorting as long as you also sort the indexes:
https://rust-lang.github.io/rfcs/1884-unstable-sort.html
Q: Can stable sort be performed using unstable sort?
A: Yes. If we transform [T] into [(T, usize)] by pairing every element with its index, then perform unstable sort, and finally remove indices, the result will be equivalent to stable sort.
.iter() | ||
.map(|i| *i as u32) | ||
.collect::<Vec<u32>>(), | ||
Ok(UInt32Array::from_iter_values( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive by cleanup to avoid an intermediate array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice
let mut converter = RowConverter::new(fields); | ||
let rows = converter.convert_columns(&arrays).unwrap(); | ||
let mut sort: Vec<_> = rows.iter().enumerate().collect(); | ||
sort.sort_unstable_by(|(_, a), (_, b)| a.cmp(b)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we changed lexsort_to_indices to be unstable in the name of performance. As in it shouldn't be claiming to be stable
}, | ||
); | ||
|
||
c.bench_function(&format!("lexsort_rows({:?}): {}", columns, len), |b| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c.bench_function(&format!("lexsort_rows({:?}): {}", columns, len), |b| { | |
c.bench_function(&format!("RowFormat: lexsort_rows({:?}): {}", columns, len), |b| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already an issue with the benchmark names being too long, so going to skip this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
classic tradeoff between concision and verboseness 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I wouldn't mind if there was some way to stop criterion truncating benchmark names, but I can't find such an option
Co-authored-by: Andrew Lamb <[email protected]>
Benchmark runs are scheduled for baseline = 51d3568 and contender = 1d36bdf. 1d36bdf is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Part of #2781
Rationale for this change
Benchmarks good 😄
What changes are included in this PR?
Adds some benchmarks of the row format, and adds a disclaimer to the lexsort kernels
So sorting using the row format is in the same ballpark or significantly faster, with the performance benefit becoming more stark with more columns
Are there any user-facing changes?
No